HTTrack Programming page
HTTrack can be used as a third-party program in batch files, or as library (see in the lib/ directory).
We will see here several examples, written in batch script (can be adapted to almost all batch script languages) or in C.
How to get one single file
httrack --get http://localhost/
|
How to get one single file and pipe it to stdout
httrack --quiet --get http://localhost/ -O tmpget -V "cat \$0" | grep -iE "TITLE"
rm -rf tmpget
|
How to search in all HTML files on a website
httrack --skeleton http://localhost/ -V "if grep -iE \"TITLE\" \"\$0\">/dev/null; then echo \"Match found at \$0\"; fi"
rm -rf tmpget
Same thing but matches only the first file:
httrack --skeleton http://localhost/ -V "if grep -iE \"TITLE\" \"\$0\">/dev/null; then echo \"Match found at \$0\"; kill -9 \$PPID; fi"
rm -rf tmpget
|
Indexing a website, and using the index as a search engine
httrack localhost -%I
Will generate an index.txt file, which contains all detected keywords, sorted and indexed using this format:
keyword
<tab> number_of_hits_in_current_page_for_this_keyword page_location
<tab> number_of_hits_in_current_page_for_this_keyword page_location
<tab> number_of_hits_in_current_page_for_this_keyword page_location
...
=total_number_of_hits_for_this_keyword
((total_number_of_hits_for_this_keyword*1000)/total_number_of_keywords)
|
Example:
abilities
1 localhost/manual/mod/index-2.html
1 localhost/manual/mod/index.html
1 localhost/manual/mod/mod_negotiation.html
=3
(0)
ability
2 localhost/manual/misc/FAQ.html
2 localhost/manual/suexec.html
1 localhost/manual/handler.html
1 localhost/manual/misc/security_tips.html
1 localhost/manual/mod/mod_rewrite.html
1 localhost/manual/mod/mod_setenvif.html
1 localhost/manual/multilogs.html
1 localhost/manual/netware.html
1 localhost/manual/new_features_1_3.html
1 localhost/manual/windows.html
=12
(0)
able
4 localhost/manual/dso.html
4 localhost/manual/mod/core.html
3 localhost/manual/dns-caveats.html
3 localhost/manual/mod/mod_auth.html
3 localhost/manual/mod/mod_rewrite.html
3 localhost/manual/upgrading_to_1_3.html
2 localhost/manual/misc/API.html
2 localhost/manual/misc/FAQ.html
2 localhost/manual/misc/windoz_keepalive.html
2 localhost/manual/mod/mod_auth_db.html
2 localhost/manual/mod/mod_auth_dbm.html
1 localhost/manual/misc/descriptors.html
1 localhost/manual/misc/fin_wait_2.html
1 localhost/manual/misc/security_tips.html
1 localhost/manual/mod/mod_auth_digest.html
1 localhost/manual/mod/mod_cern_meta.html
1 localhost/manual/mod/mod_env.html
1 localhost/manual/mod/mod_example.html
1 localhost/manual/mod/mod_unique_id.html
1 localhost/manual/mod/mod_usertrack.html
1 localhost/manual/stopping.html
1 localhost/manual/suexec.html
1 localhost/manual/vhosts/ip-based.html
1 localhost/manual/windows.html
=43
(0)
...
|
Script example:
search.sh
|
|
|
|